NSF PAR Search | NSF Public Access Repository


Long-Form Answers to Visual Questions from Blind and Low Vision People

Huh, Mina; Xu, Fangyuan; Peng, Yi-Hao; Chen, Congyan; Murugu, Hansika; Gurari, Danna; Choi, Eunsol; Pavel, Amy (October 2024, Conference on Language Modeling)

Vision language models can now generate long-form answers to questions about images, i.e., long-form visual question answers (LFVQA). We contribute VizWiz-LF, a dataset of long-form answers to visual questions posed by blind and low vision (BLV) users. VizWiz-LF contains 4.2k long-form answers to 600 visual questions, collected from human expert describers and six VQA models. We develop and annotate functional roles of sentences of LFVQA and demonstrate that long-form answers contain information beyond the question answer such as explanations and suggestions. We further conduct automatic and human evaluations with BLV and sighted people to evaluate long-form answers. BLV people perceive both human-written and generated long-form answers to be plausible, but generated answers often hallucinate incorrect visual details, especially for unanswerable visual questions (e.g., blurry or irrelevant images). To reduce hallucinations, we evaluate the ability of VQA models to abstain from answering unanswerable questions across multiple prompting strategies.
Full Text Available
GeoLatent: A Geometric Approach to Latent Space Design for Deformable Shape Generators

https://doi.org/10.1145/3618371

Yang, Haitao; Sun, Bo; Chen, Liyan; Pavel, Amy; Huang, Qixing (December 2023, ACM Transactions on Graphics)

We study how to optimize the latent space of neural shape generators that map latent codes to 3D deformable shapes. The key focus is to look at a deformable shape generator from a differential geometry perspective. We define a Riemannian metric based on as-rigid-as-possible and as-conformal-as-possible deformation energies. Under this metric, we study two desired properties of the latent space: 1) straight-line interpolations in latent codes follow geodesic curves; 2) latent codes disentangle pose and shape variations at different scales. Strictly enforcing the geometric interpolation property, however, only applies if the metric matrix is a constant. We show how to achieve this property approximately by enforcing that geodesic interpolations are axis-aligned, i.e., interpolations along coordinate axes follow geodesic curves. In addition, we introduce a novel approach that decouples pose and shape variations via generalized eigendecomposition. We also study efficient regularization terms for learning deformable shape generators, e.g., that promote smooth interpolations. Experimental results on benchmark datasets show that our approach leads to interpretable latent codes, improves the generalizability of synthetic shapes, and enhances performance in geodesic interpolation and geodesic shooting.
Full Text Available
Tech Help Desk: Support for Local Entrepreneurs Addressing the Long Tail of Computing Challenges

https://doi.org/10.1145/3491102.3517708

Kotturi, Yasmine; Johnson, Herman T; Skirpan, Michael; Fox, Sarah E; Bigham, Jeffrey P; Pavel, Amy (April 2022, CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems)

Full Text Available
Controlling Dialogue Generation with Semantic Exemplars

https://doi.org/10.18653/v1/2021.naacl-main.240

Gupta, Prakhar; Bigham, Jeffrey; Tsvetkov, Yulia; Pavel, Amy (January 2021, The 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL))

Full Text Available
Co-designing Socially Assistive Sidekicks for Motion-based AAC

https://doi.org/10.1145/3434073.3444646

Valencia, Stephanie; Luria, Michal; Pavel, Amy; Bigham, Jeffrey P.; Admoni, Henny (March 2021, HRI '21: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction)
Augmentative and alternative communication (AAC) devices enable speech-based communication. However, AAC devices do not support nonverbal communication, which allows people to take turns, regulate conversation dynamics, and express intentions. Nonverbal communication requires motion, which is often challenging for AAC users to produce due to motor constraints. In this work, we explore how socially assistive robots, framed as "sidekicks," might provide augmented communicators (ACs) with a nonverbal channel of communication to support their conversational goals. We developed and conducted an accessible co-design workshop that involved two ACs, their caregivers, and three motion experts. We identified goals for conversational support, co-designed prototypes depicting possible sidekick forms, and enacted different sidekick motions and behaviors to achieve speakers' goals. We contribute guidelines for designing sidekicks that support ACs according to three key parameters: attention, precision, and timing. We show how these parameters manifest in appearance and behavior and how they can guide future designs for augmented nonverbal communication.
Full Text Available
